In this project, I explore the application of Meta’s Prophet forecasting system to analyze time series data. Prophet is an open-source forecasting tool developed by Meta that is designed to handle time series with strong seasonal effects and multiple seasons of historical data.
Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, including holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data.
The model equation in Prophet can be represented as:
\[y(t) = g(t) + s(t) + h(t) + \epsilon_t\]
Where: - \(g(t)\) is the trend component - \(s(t)\) is the seasonality component - \(h(t)\) is the holiday effect - \(\epsilon_t\) is the error term
For this analysis, I will be using the CO2 dataset available in R, which contains atmospheric carbon dioxide measurements from the Mauna Loa Observatory in Hawaii from 1959 to 1997. This dataset is particularly interesting because:
Let’s first load the necessary libraries and examine the data:
Let’s first examine the CO2 dataset structure and visualize the raw data:
## [1] "ts"
## Time-Series [1:468] from 1959 to 1998: 315 316 316 318 318 ...
## [1] 1959 1
## [1] 1997 12
## [1] 12
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 313.2 323.5 335.2 337.1 350.3 366.8
# Plot the raw time series
plot(co2, main="Monthly Atmospheric CO2 Concentration",
ylab="CO2 (ppm)", xlab="Year")Before applying Prophet, basic analysis is performed to understand the characteristics of the CO2 time series:
# Calculate growth rate
annual_growth <- 100 * mean(diff(co2, lag=12), na.rm=TRUE) / mean(co2)
cat("Average annual growth rate:", round(annual_growth, 2), "%\n")## Average annual growth rate: 0.37 %
# Decompose the time series to see trend, seasonal, and random components
co2_decomp <- decompose(co2)
plot(co2_decomp)# Linear trend analysis
time_index <- 1:length(co2)
linear_model <- lm(co2 ~ time_index)
summary(linear_model)##
## Call:
## lm(formula = co2 ~ time_index)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.0399 -1.9476 -0.0017 1.9113 6.5149
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.115e+02 2.424e-01 1284.9 <2e-16 ***
## time_index 1.090e-01 8.958e-04 121.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.618 on 466 degrees of freedom
## Multiple R-squared: 0.9695, Adjusted R-squared: 0.9694
## F-statistic: 1.479e+04 on 1 and 466 DF, p-value: < 2.2e-16
# Visualize with trend line
plot(co2, main="CO2 with Linear Trend", ylab="CO2 (ppm)", xlab="Year")
abline(linear_model, col="red")Prophet requires a specific format for input data - a dataframe with columns ‘ds’ (date) and ‘y’ (value):
# Convert ts object to dataframe for Prophet
co2_df <- data.frame(
ds = zoo::as.yearmon(time(co2)),
y = as.numeric(co2)
)
# Display the first few rows
head(co2_df)## ds y
## 1 Jan 1959 315.42
## 2 Feb 1959 316.31
## 3 Mar 1959 316.50
## 4 Apr 1959 317.56
## 5 May 1959 318.13
## 6 Jun 1959 318.00
Now let’s implement Prophet:
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
# Create future dataframe for forecasting (8 quarters ahead)
future <- make_future_dataframe(model, periods=8, freq="quarter")
# Make predictions
forecast <- predict(model, future)
# Display forecast components
head(forecast[, c("ds", "yhat", "yhat_lower", "yhat_upper")])## ds yhat yhat_lower yhat_upper
## 1 1959-01-01 315.2778 314.8370 315.7617
## 2 1959-02-01 316.0343 315.5398 316.4986
## 3 1959-03-01 316.7482 316.2918 317.2268
## 4 1959-04-01 318.0628 317.6105 318.5223
## 5 1959-05-01 318.7045 318.2703 319.1653
## 6 1959-06-01 318.1195 317.6555 318.5570
Let’s explore how different seasonality settings affect the forecast:
# Model with multiplicative seasonality
model_mult <- prophet(co2_df,
seasonality.mode = "multiplicative")## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
The trend flexibility can be adjusted by modifying the changepoint parameters:
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
forecast_flexible <- predict(model_flexible, future)
# Compare trends
plot(model_flexible, forecast_flexible)Let’s create an interactive visualization of our forecast:
library(plotly)
# Create interactive plot
plot_ly() %>%
add_lines(x = ~co2_df$ds, y = ~co2_df$y,
name = "Historical", line = list(color = "black")) %>%
add_lines(x = ~forecast$ds, y = ~forecast$yhat,
name = "Forecast", line = list(color = "blue")) %>%
add_ribbons(x = ~forecast$ds,
ymin = ~forecast$yhat_lower,
ymax = ~forecast$yhat_upper,
name = "Uncertainty",
fillcolor = "rgba(0, 0, 255, 0.2)") %>%
layout(title = "CO2 Concentration Forecast",
xaxis = list(title = "Date"),
yaxis = list(title = "CO2 (ppm)"))Let’s compare the different models we’ve created:
# Extract and combine prediction results
results <- data.frame(
Date = tail(forecast$ds, 8),
Basic = tail(forecast$yhat, 8),
Multiplicative = tail(forecast_mult$yhat, 8),
Flexible = tail(forecast_flexible$yhat, 8)
)
# Display comparison table
knitr::kable(results, caption = "Forecast Comparison for Next 8 Quarters")| Date | Basic | Multiplicative | Flexible |
|---|---|---|---|
| 1998-03-01 | 365.9377 | 366.0603 | 365.9533 |
| 1998-06-01 | 367.3121 | 367.5168 | 367.3280 |
| 1998-09-01 | 362.2602 | 362.0111 | 362.2773 |
| 1998-12-01 | 364.7259 | 364.6641 | 364.7419 |
| 1999-03-01 | 367.2811 | 367.4018 | 367.3031 |
| 1999-06-01 | 368.7574 | 368.9770 | 368.7780 |
| 1999-09-01 | 363.7058 | 363.4474 | 363.7284 |
| 1999-12-01 | 366.1395 | 366.0765 | 366.1605 |
# Calculate metrics for in-sample fit
# Use the last 12 data points as a test set
train_df <- head(co2_df, nrow(co2_df) - 12)
test_df <- tail(co2_df, 12)
# Train models on training data
model_train <- prophet(train_df)## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
future_train <- make_future_dataframe(model_train, periods=12, freq="month")
forecast_train <- predict(model_train, future_train)
# Calculate RMSE
forecast_values <- tail(forecast_train$yhat, 12)
actual_values <- test_df$y
rmse <- sqrt(mean((forecast_values - actual_values)^2))
cat("Root Mean Square Error (RMSE):", round(rmse, 3), "ppm\n")## Root Mean Square Error (RMSE): 0.626 ppm
# Calculate MAPE
mape <- 100 * mean(abs((forecast_values - actual_values) / actual_values))
cat("Mean Absolute Percentage Error (MAPE):", round(mape, 3), "%\n")## Mean Absolute Percentage Error (MAPE): 0.147 %
Based on the analysis and forecasting results, we can draw several insights:
Long-term Trend: The CO2 concentration shows a clear upward trend over time, increasing at an average rate of 0.37% per year.
Seasonality: There is a distinct seasonal pattern in the CO2 data, with peaks typically occurring in March, April and May and troughs in October and November.
Forecast Accuracy: The Prophet model achieved a RMSE of 0.626 ppm and a MAPE of 0.147%.
Future Projections: Based on the forecast, CO2 levels are expected to reach 366 ppm by the year 2000, continuing the concerning upward trend.
In this project, I demonstrated the application of Meta’s Prophet forecasting system to analyze atmospheric CO2 concentration data. The Prophet model effectively captured both the long-term trend and seasonal patterns in the data, providing valuable insights and accurate forecasts.
The results highlight the continued increase in atmospheric CO2 levels, which has significant implications for climate change and environmental policy. The forecasting approach demonstrated here could be applied to other environmental time series to support decision-making and planning.
Prophet proved to be a powerful and flexible tool for time series forecasting, allowing for easy incorporation of seasonal patterns and trend changes with almost no manual configuration.
?co2